Transferring data from integer to vector registers

ABSTRACT

A method for transferring data from a general purpose register to a vector register, the method including splatting a byte of data directly from a general purpose register (GPR) to a vector register (VR) by means of vector permute instructions, and splatting another byte of data from the GPR to the VR and vectorially combining the data in the VR.

FIELD OF THE INVENTION

The present invention relates generally to vector processing, and more particularly to transferring data directly from a general purpose register to a vector register.

BACKGROUND OF THE INVENTION

Many microprocessors operate with vector architectures and include a Vector Processing Unit (VPU). Vector architectures enable simultaneous processing of many data items in parallel. Operations may be performed on multiple data elements by a single instruction, which is referred to as Single Instruction Multiple Data (SIMD) parallel processing.

Many implementations of a VPU may use dedicated register files that are disjoint from a General Purpose Register (GPR) file. There is accordingly a need to transfer data from the GPR to a Vector Register (VR).

Prior art solutions for transferring data from the GPR to the VR may be classified into three main approaches. The first approach stores data from a GPR to memory and then loads the data from the memory into a VR. An example of this approach is embodied in AltiVec. AltiVec (trademark of Motorola, Inc.) is a high bandwidth, parallel operation vector execution unit developed as a SIMD extension to the PowerPC ISA (instruction set architecture). AltiVec is a vector architecture that can process multiple data streams/blocks in a single cycle. However, transferring data indirectly through memory has disadvantages. It is time consuming and can cause pipeline stalls.

A second approach provides explicit instructions to transfer data to/from the register files. Intel's MMX/SSE/SSE2/SSE3 technologies employ this solution. However, this has the disadvantage of adding instructions to the architecture. While the additional instructions may be acceptable for a CISC (Complex Instruction Set Computer), they are undesirably limiting for a RISC (Reduced Instruction Set Computer).

A third approach has the vector and scalar registers share the same file. In this manner the vector and scalar instructions access the same physical register, eliminating the need to transfer data between them. This was the original implementation of Intel's MMX technology. However, it has the disadvantage of reducing the number of registers available to the processor.

SUMMARY OF THE INVENTION

The present invention seeks to provide an improved method for transferring data directly from a general purpose register or floating point register (also referred to as an integer register, the terms being used interchangeably throughout the specification and claims) to a vector register, as is described in more detail hereinbelow.

In one embodiment of the invention, the method includes splatting a byte of data directly from the general purpose register (GPR) to a vector register (VR) by means of vector permute instructions, and splatting another byte of data from the GPR to the VR and vectorially combining the data in the VR.

In accordance with a non-limiting embodiment of the invention, the method may be carried out with the lvsl and lvsr instructions of the PowerPC Instruction Set Architecture (ISA). These instructions are mainly used to create permute masks for loading/storing misaligned data. Each instruction takes the lowest 4 bits (nibble) of a GPR and writes the nibble into the first byte of a vector register, wherein each successive byte contains the previous byte's value plus 1. These instructions are the only ones in the AltiVec ISA that define the contents of a VR based on a GPR. As is described in more detail hereinbelow, by manipulating these instructions it is possible to transfer data from the GPR to the VR without having to use memory as an intermediate medium, and without adding a specific, explicit data transfer instruction.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood and appreciated more fully from the following detailed description taken in conjunction with the appended drawings in which:

FIG. 1A is a simplified block diagram illustration that shows how vector processing load instructions may be used to insert the lowest 4 bits of integer registers into 16 bytes of a resulting vector register;

FIG. 1B is a simplified block diagram illustration that shows how a vector processing instruction (in AltiVec) takes a vector register and index and copies the value in that index across a result register;

FIG. 2 is a simplified flow chart of a method for transferring data directly from a general purpose register to a vector register in accordance with an embodiment of the present invention, wherein the four least significant bits (LSBs) of data are splat into a vector register, and then the whole character is splat into the vector register by shifting a high nibble into a low nibble and combining vector results; and

FIG. 3 is a simplified flow chart of a faster method for transferring data directly from a general purpose register to a vector register in accordance with another embodiment of the present invention, wherein the whole character is splat into the vector register.

DETAILED DESCRIPTION OF EMBODIMENTS

The present invention implements existing instructions used with Vector Processing Units (VPUs), particularly for VPUs that operate with Single Instruction Multiple Data (SIMD) parallel processing, in order to transfer data directly from a general purpose register (GPR) to a vector register (VR) without going through a memory in between. For convenience, the invention will be described hereinbelow with instructions used in the AltiVec parallel operation vector execution unit. However, the invention is not limited to the instruction set of AltiVec, and the invention can be carried out with other VPUs and instruction sets.

The parallel processing capability of AltiVec may include vector permute operations. Some of the instructions for performing permute operations are the lvsl and lvsr instructions of the PowerPC Instruction Set Architecture (ISA). The lvsl and lvsr instructions are load instructions, and they respectively stand for “load vector for shift left” and “load vector for shift right”. The format of the instructions is as follows:

lvsl vD,rA,rB (and similarly lvsr vD,rA,rB)

wherein vD is the resulting vector register and rA, rB are integer registers.

The lvsl and lvsr instructions are used to create permute masks for loading or storing unaligned (alternatively referred to as misaligned) data. Specifically, they calculate a “shift permutation vector” for use with unaligned data. These instructions take the lowest 4 bits (nibble) of a GPR (calculated as an index from rA and rB) and write the nibble into the first byte of a vector register. Each successive byte contains the previous byte's value plus 1. The lvsl and lvsr instructions may be used with a “vperm” instruction to format the data, based upon the nibble. The vperm instruction allows swapping the bytes in a vector register based upon another vector register that contains the required order (permutation) of the bytes. For example, a combination of the lvsl and lvsr instructions together with the vperm instruction may be used to read in two sets of 16 bytes and then extract the middle 16 bytes.
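The behavior described above can be illustrated with a minimal scalar sketch in portable C. It is a model for explanation only, not AltiVec code: the type vec16 and the functions model_lvsl, model_vperm and model_unaligned_load are hypothetical names introduced here, and only the lvsl case is modeled. The sketch assumes that vperm selects each result byte from the 32-byte concatenation of its two source vectors using the low 5 bits of the corresponding control byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for a 16-byte vector register. */
typedef struct { uint8_t b[16]; } vec16;

/* Model of lvsl: byte i of the result is (low nibble of ea) + i. */
vec16 model_lvsl(uintptr_t ea)
{
    vec16 v;
    uint8_t sh = (uint8_t)(ea & 0xF);
    for (int i = 0; i < 16; i++)
        v.b[i] = (uint8_t)(sh + i);
    return v;
}

/* Model of vperm (assumed semantics): the low 5 bits of each control
   byte select one byte from the 32-byte concatenation of a and b. */
vec16 model_vperm(vec16 a, vec16 b, vec16 ctl)
{
    vec16 r;
    for (int i = 0; i < 16; i++) {
        uint8_t idx = (uint8_t)(ctl.b[i] & 0x1F);
        r.b[i] = (idx < 16) ? a.b[idx] : b.b[idx - 16];
    }
    return r;
}

/* Read two adjacent 16-byte blocks and extract the 16 bytes that start
   at the given misalignment offset (the "middle 16 bytes"). */
vec16 model_unaligned_load(const uint8_t *aligned_base, unsigned offset)
{
    vec16 lo, hi;
    assert(offset < 16);
    memcpy(lo.b, aligned_base, 16);
    memcpy(hi.b, aligned_base + 16, 16);
    return model_vperm(lo, hi, model_lvsl(offset));
}
```

With a 32-byte buffer holding the values 0 to 31 and an offset of 5, the model returns the bytes 5 to 20, which straddle the two aligned blocks.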

FIG. 1A is a simplified illustration that shows how the lvsl or lvsr instruction inserts the low nibble of the sum of the integer registers rA and rB into the 16 bytes of the resulting vector register vD.

The lvsl and lvsr instructions are the only ones in the AltiVec ISA that define the contents of a VR based on a GPR.

These instructions may be used to “splat” (that is, copy into every item) a scalar data value across a vector register. In AltiVec, this is usually performed with the so-called vec_splat intrinsic instruction, which takes a vector register and index and copies the value in that index across the result register, as shown in FIG. 1B.

The following code sequence is an example of instructions for splatting a scalar data value across a vector register, using AltiVec instruction terminology and nomenclature:

    /* copy data into an aligned-on-16-byte address */
    achar tchar = (char)c;
    /* load scalar from memory into a vector register */
    vChar = vec_lde(0, (unsigned char *)&tchar);
    /* splat the data */
    vChar = vec_splat(vChar, 0);
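The effect of the memory-based sequence above can be sketched in portable C as follows. This is a scalar model under stated assumptions rather than AltiVec code: vec16, model_splat and model_splat_via_memory are hypothetical names, and the bytes that a real element load would leave undefined are simply zeroed here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for a 16-byte vector register. */
typedef struct { uint8_t b[16]; } vec16;

/* Model of vec_splat: copy the byte at 'index' into every byte. */
vec16 model_splat(vec16 v, unsigned index)
{
    vec16 r;
    assert(index < 16);
    for (int i = 0; i < 16; i++)
        r.b[i] = v.b[index];
    return r;
}

/* Model of the store / load / splat path that goes through memory. */
vec16 model_splat_via_memory(unsigned char c)
{
    unsigned char slot[16] = {0};  /* stands in for the aligned memory slot */
    vec16 v;
    slot[0] = c;                   /* the scalar is written out to memory */
    memcpy(v.b, slot, 16);         /* models the load into element 0 */
    return model_splat(v, 0);      /* copy element 0 across the register */
}
```

The round trip through the slot array is the step that the method of the invention avoids.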

As mentioned before, the present invention provides a method for transferring data directly from a general purpose register (integer register) to a vector register. In one non-limiting embodiment of the invention, a set of instructions is provided for splatting a byte value in a GPR into a VR, as is now explained with reference to FIG. 2.

In a simplified embodiment of the invention, the four least significant bits (LSBs) of a char (data from the GPR) may be splat into a vector register (using AltiVec instruction terminology and nomenclature):

v1=lvsl(r)−lvsl(0)

An example of C code that performs this (assuming that c is in the lower nibble) is:

    /* step 201: create a vector 0,1,2,...,15 */
    vAlign = vec_lvsl(0, (unsigned char *)0);
    /* step 202: cast the value into a pointer */
    ptr = (unsigned char *)c;
    /* step 203: create a vector c,c+1,c+2,...,c+15 */
    vChar = vec_lvsl(0, (unsigned char *)ptr);
    /* step 204: splat the low nibble into the low nibbles of vChar */
    vChar = vec_sub(vChar, vAlign);
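To see why the subtraction leaves the low nibble in every byte, the steps can be traced with a scalar model in portable C. The names vec16, model_lvsl, model_sub and splat_low_nibble are hypothetical and introduced only for this sketch; the subtraction is assumed to wrap modulo 256 in each byte, as for unsigned char elements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a 16-byte vector register. */
typedef struct { uint8_t b[16]; } vec16;

/* Model of lvsl: byte i of the result is (low nibble of ea) + i. */
vec16 model_lvsl(uintptr_t ea)
{
    vec16 v;
    uint8_t sh = (uint8_t)(ea & 0xF);
    for (int i = 0; i < 16; i++)
        v.b[i] = (uint8_t)(sh + i);
    return v;
}

/* Model of a byte-wise subtraction that wraps modulo 256. */
vec16 model_sub(vec16 x, vec16 y)
{
    vec16 r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (uint8_t)(x.b[i] - y.b[i]);
    return r;
}

/* v1 = lvsl(r) - lvsl(0): the low nibble n of c ends up in every byte. */
vec16 splat_low_nibble(unsigned char c)
{
    vec16 align = model_lvsl(0);             /* 0, 1, ..., 15      */
    vec16 ramp  = model_lvsl((uintptr_t)c);  /* n, n+1, ..., n+15  */
    return model_sub(ramp, align);           /* n in every byte    */
}

/* Usage check: for c = 0x3A every byte of the result is 0x0A. */
void check_splat_low_nibble(void)
{
    vec16 v = splat_low_nibble(0x3A);
    for (int i = 0; i < 16; i++)
        assert(v.b[i] == 0x0A);
}
```

Byte i of the ramp is n+i and byte i of the alignment vector is i, so the difference is n at every position. The high nibble of c is discarded by lvsl.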

To splat the whole character into a vector, one may shift the high nibble of c into the low nibble, use lvsl, and then combine both vector results (step 205):

v1=lvsl(r)−lvsl(0)

v2=lvsl(r>>4)−lvsl(0)

v3=v2<<4|v1 (or add them together).
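The three formulas can be checked with a scalar model in portable C. As before, vec16, model_lvsl, model_sub and splat_byte are hypothetical names used only for illustration, and the byte-wise shift and OR are written as plain loops.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a 16-byte vector register. */
typedef struct { uint8_t b[16]; } vec16;

/* Model of lvsl: byte i of the result is (low nibble of ea) + i. */
vec16 model_lvsl(uintptr_t ea)
{
    vec16 v;
    uint8_t sh = (uint8_t)(ea & 0xF);
    for (int i = 0; i < 16; i++)
        v.b[i] = (uint8_t)(sh + i);
    return v;
}

/* Model of a byte-wise subtraction that wraps modulo 256. */
vec16 model_sub(vec16 x, vec16 y)
{
    vec16 r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (uint8_t)(x.b[i] - y.b[i]);
    return r;
}

/* v1 = lvsl(r) - lvsl(0); v2 = lvsl(r>>4) - lvsl(0); v3 = v2<<4 | v1 */
vec16 splat_byte(unsigned char c)
{
    vec16 align = model_lvsl(0);
    vec16 v1 = model_sub(model_lvsl((uintptr_t)c), align);         /* low nibble  */
    vec16 v2 = model_sub(model_lvsl((uintptr_t)(c >> 4)), align);  /* high nibble */
    vec16 v3;
    for (int i = 0; i < 16; i++)
        v3.b[i] = (uint8_t)((v2.b[i] << 4) | v1.b[i]);
    return v3;
}

/* Usage check: every possible byte value is reproduced in all 16 bytes. */
void check_splat_byte(void)
{
    for (unsigned c = 0; c < 256; c++) {
        vec16 v = splat_byte((unsigned char)c);
        for (int i = 0; i < 16; i++)
            assert(v.b[i] == c);
    }
}
```

Because v1 holds only low nibbles and the shifted v2 holds only high nibbles, the OR and the addition mentioned above give the same result.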

The invention, of course, is not limited to the above code that splats the four LSBs into the VR. Rather, the invention encompasses other methods for splatting the whole character into the VR, an example of which is now explained with reference to FIG. 3.

An example of the C code that copies the value in character c to the vector vChar is the following:

    /* step 301: create a vector 0,1,2,...,15 */
    vAlign = vec_lvsl(0, (unsigned char *)0);
    /* step 302: create a shift value register */
    sval = vec_splat_u8(4);
    /* step 303: cast the value into a pointer */
    ptr = (unsigned char *)c;
    /* step 304: splat the low nibble into the low nibbles of vChar */
    vChar = vec_sub(vec_lvsl(0, (unsigned char *)ptr), vAlign);
    /* step 305 */
    ptr = (unsigned char *)(c >> 4);
    /* step 306: splat the high nibble into the low nibble of vTemp (first vector result) */
    vTemp = vec_sub(vec_lvsl(0, (unsigned char *)ptr), vAlign);
    /* step 307: shift the low nibbles of vTemp into the high nibbles (second vector result) */
    vTemp = vec_sl(vTemp, sval);
    /* step 308: OR together both nibbles */
    vChar = vec_or(vChar, vTemp);

The latter code is longer; nevertheless, it is much faster. In testing, when compiled using xlc 7.0 with the flags -O3 -qaltivec -qarch=ppc970 -q64 and then executed on a PowerPC 970 processor, a speedup of 1.7 was obtained.

An even faster method for splatting the whole character into the VR may be obtained with the following optional instructions that follow step 303:

    /* step 309 */
    vChar = vec_lvsl(0, (unsigned char *)ptr);
    /* step 305 */
    ptr = (unsigned char *)(c >> 4);
    /* step 310 */
    vTemp = vec_lvsl(0, (unsigned char *)ptr);
    /* step 307 */
    vTemp = vec_sl(vTemp, sval);
    /* step 308 */
    vChar = vec_or(vChar, vTemp);
    /* step 311 */
    vChar = vec_splat(vChar, 0);
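The reason the subtraction can be dropped is that only the first byte of each lvsl result is needed once a final splat is applied. A scalar model in portable C makes this visible. The names vec16, model_lvsl and splat_byte_fast are hypothetical, and the sketch assumes that the second lvsl result is the one that is shifted and then combined with the first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a 16-byte vector register. */
typedef struct { uint8_t b[16]; } vec16;

/* Model of lvsl: byte i of the result is (low nibble of ea) + i. */
vec16 model_lvsl(uintptr_t ea)
{
    vec16 v;
    uint8_t sh = (uint8_t)(ea & 0xF);
    for (int i = 0; i < 16; i++)
        v.b[i] = (uint8_t)(sh + i);
    return v;
}

/* Faster variant: no subtraction, one splat of byte 0 at the end. */
vec16 splat_byte_fast(unsigned char c)
{
    vec16 lo = model_lvsl((uintptr_t)c);         /* byte 0 = low nibble of c  */
    vec16 hi = model_lvsl((uintptr_t)(c >> 4));  /* byte 0 = high nibble of c */
    vec16 merged, r;

    /* Shift each byte left by 4 (truncated to 8 bits) and OR with lo.
       Bytes 1..15 hold ramp residue, but byte 0 is exactly c. */
    for (int i = 0; i < 16; i++)
        merged.b[i] = (uint8_t)((uint8_t)(hi.b[i] << 4) | lo.b[i]);

    /* Copy byte 0 across the whole register. */
    for (int i = 0; i < 16; i++)
        r.b[i] = merged.b[0];
    return r;
}

/* Usage check: every possible byte value is reproduced in all 16 bytes. */
void check_splat_byte_fast(void)
{
    for (unsigned c = 0; c < 256; c++) {
        vec16 v = splat_byte_fast((unsigned char)c);
        for (int i = 0; i < 16; i++)
            assert(v.b[i] == c);
    }
}
```

The bytes after the first one contain incrementing values that are never cleaned up; the final splat overwrites them with the first byte.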

The sub instructions and the vec_lvsl of 0 (steps 304 and 306) have been omitted, while a vec_splat (step 311) has been added.

The splat operation has significant importance in many applications. One example is a vectorized strchr function, where strchr(str,c) returns the position of the character c in string str or 0 if it does not exist. Another use is in pixel-blending applications where a char value used to mask two images must be copied across several vectors.
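As a rough illustration of the strchr use case, the following portable C sketch compares a splatted search character against a string one 16-byte block at a time. It is a scalar model and not the implementation contemplated by the specification: vec16, model_splat_value and blockwise_strchr are hypothetical names, the lane-by-lane loop stands in for a vector compare, and the buffer is assumed to be NUL-terminated with a capacity that is a multiple of 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for a 16-byte vector register. */
typedef struct { uint8_t b[16]; } vec16;

/* Fill every byte of a model register with the same value. */
vec16 model_splat_value(uint8_t value)
{
    vec16 v;
    memset(v.b, value, sizeof v.b);
    return v;
}

/* Block-wise search: returns a pointer to the first occurrence of c,
   or NULL if the terminator is reached first. */
const char *blockwise_strchr(const char *buf, size_t capacity, char c)
{
    vec16 needle = model_splat_value((uint8_t)c);
    vec16 zero = model_splat_value(0);

    assert(capacity % 16 == 0);
    for (size_t off = 0; off < capacity; off += 16) {
        vec16 chunk;
        memcpy(chunk.b, buf + off, 16);
        /* Stands in for a lane-wise vector compare on the whole block. */
        for (int i = 0; i < 16; i++) {
            if (chunk.b[i] == needle.b[i])
                return buf + off + i;
            if (chunk.b[i] == zero.b[i])
                return NULL;
        }
    }
    return NULL;
}
```

For a 32-byte buffer holding "hello, vector world", searching for 'w' returns a pointer to offset 14, and searching for 'z' returns NULL.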

It is noted that the methods described herein may be carried out by a computer program product 110, such as, but not limited to, a network interface card, hard disk, optical disk, memory device and the like, which may include instructions for carrying out the methods described herein.

The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

1. A method for transferring data from a general purpose register to a vector register, the method comprising: splatting a byte of data directly from a general purpose register (GPR) to a vector register (VR) by means of vector permute instructions; and splatting another byte of data from the GPR to the VR and vectorially combining the data in the VR.
2. The method according to claim 1, further comprising shifting bytes of data in the VR with vector permute instructions prior to splatting further bytes of data from the GPR to the VR.
3. The method according to claim 1, wherein said vector permute instructions comprise instructions used for Single Instruction Multiple Data parallel processing.
4. The method according to claim 1, wherein said vector permute instructions comprise “load vector for shift left” (lvsl) and “load vector for shift right” (lvsr) instructions of a PowerPC Instruction Set Architecture (ISA).
5. The method according to claim 1, wherein splatting bytes of data from the GPR to the VR comprises splatting the four least significant bits (LSBs) of data (nibble) from the GPR into the VR.
6. The method according to claim 5, further comprising splatting low nibbles of data into the VR to obtain a first vector result, shifting high nibbles of data into the low nibbles to obtain a second vector result, and vectorially combining both vector results.
7. The method according to claim 1, wherein splatting bytes of data from the GPR to the VR comprises: splatting a low nibble into low nibbles of the VR; splatting a high nibble into the low nibble of the VR to obtain a first vector result; shifting low nibbles into high nibbles to obtain a second vector result; and combining both vector results into the VR.
8. The method according to claim 7, further comprising before splatting the low nibble into low nibbles of the VR: creating a vector value 0, 1, 2, . . . 15; creating a shift value register; and casting the vector value into a pointer.
9. The method according to claim 7, wherein combining both vector results comprises OR'ing together said vector results.
10. A computer program product comprising: instructions for splatting a byte of data directly from a general purpose register (GPR) to a vector register (VR) by means of vector permute instructions; and instructions for splatting another byte of data from the GPR to the VR and vectorially combining the data in the VR.
11. The computer program product according to claim 10, further comprising instructions for shifting bytes of data in the VR with vector permute instructions prior to splatting further bytes of data from the GPR to the VR.
12. The computer program product according to claim 10, wherein said vector permute instructions comprise instructions used for Single Instruction Multiple Data parallel processing.
13. The computer program product according to claim 10, wherein said vector permute instructions comprise “load vector for shift left” (lvsl) and “load vector for shift right” (lvsr) instructions of a PowerPC Instruction Set Architecture (ISA).
14. The computer program product according to claim 10, wherein the instructions for splatting bytes of data from the GPR to the VR comprise instructions for splatting the four least significant bits (LSBs) of data (nibble) from the GPR into the VR.
15. The computer program product according to claim 14, wherein the instructions for splatting bytes of data from the GPR to the VR comprise instructions for splatting low nibbles of data into the VR to obtain a first vector result, instructions for shifting high nibbles of data into the low nibbles to obtain a second vector result, and instructions for vectorially combining both vector results.
16. The computer program product according to claim 10, wherein instructions for splatting bytes of data from the GPR to the VR comprise: instructions for splatting a low nibble into low nibbles of the VR; instructions for splatting a high nibble into the low nibble of the VR to obtain a first vector result; instructions for shifting low nibbles into high nibbles to obtain a second vector result; and instructions for combining both vector results into the VR.
17. The computer program product according to claim 16, further comprising before the instructions for splatting the low nibble into low nibbles of the VR: instructions for creating a vector value 0, 1, 2, . . . 15; instructions for creating a shift value register; and instructions for casting the vector value into a pointer.
18. The computer program product according to claim 16, wherein instructions for combining both vector results comprise instructions for OR'ing together said vector results.